library(readxl)
library(kableExtra)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::group_rows() masks kableExtra::group_rows()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(vistime)
library(treemap)
# require(devtools)
# install_github("lchiffon/wordcloud2")
library(wordcloud2)
library(flextable)
##
## Attaching package: 'flextable'
##
## The following object is masked from 'package:purrr':
##
## compose
##
## The following objects are masked from 'package:kableExtra':
##
## as_image, footnote
library(devtools)
## Loading required package: usethis
## Warning: package 'usethis' was built under R version 4.3.3
devtools::install_github("https://github.com/Genentech/phase1b")
## Using GitHub PAT from the git credential store.
## Skipping install of 'phase1b' from a github remote, the SHA1 (9fbc3baf) has not changed since last install.
## Use `force = TRUE` to force installation
library(phase1b)
phase1b journey
Audrey Yeo
Opinions do not reflect employer
This presentation has ALT text and as much as possible, uses colour-blind friendly palettes
Code for this Quarto-rendered .html will also be shared
phase1b
phase1b package history
2015 : Started as a need in Roche’s early development group; package development led by Daniel Sabanés Bové.
2023 : Refactoring, Renaming, adding Unit and Integration tests as current State-of-Art Software Engineering practice.
100% written in R and Open Source.
Website : genentech.github.io/phase1b/
library(devtools)
devtools::install_github("https://github.com/Genentech/phase1b")
library(phase1b)
…the combined Scientific and Business development of a therapy.
graduated MSc Biostatistics in 2020
started at RWD at Roche in mid 2021
joined R&D at Roche in mid 2022
- Project Lead Statistician in early Oncology Trials
- Study Statistician for phase 1-2, phase 3 trials
first internal statistics presentation end 2022 on decision gating
started phase1b in July 2023
phase1b’s first external tour at PSI (pic later) and
useR! in 2024
Why find solutions when we can also build them ?
Mathematics is Elegant
Building software can be inclusive and collaborative
I can create delightful experiences for users and bring everyone along : values of inclusion, building great products, having an impact
control = 0.6
thetaT_low = 0.6
result <- predprob(
x = 16, n = 23, Nmax = 40, p = control, thetaT = thetaT_low,
parE = c(0.6, 0.4)
)
thetaT_high = 0.9
result_high_thetaT <- predprob(
x = 16, n = 23, Nmax = 40, p = control, thetaT = thetaT_high,
parE = c(0.6, 0.4)
)
data = rbind(result$table, result_high_thetaT$table)
data$thetaT = c(rep("60%", 18), rep("90%", 18))
df_thetaTlow <- data %>% filter(thetaT == "60%")
df_thetaThigh <- data %>% filter(thetaT == "90%")
ggplot(df_thetaTlow) +
geom_point(aes(x = cumul_counts, y = posterior, colour = success)) +
scale_x_discrete(limits = c(seq(23, 40, by = 1))) +
xlab("\nFuture successful responders") +
ylab("Probability\n") +
ggtitle("With P (RR > 0.6 | data ) > 60% : \n32 of 40 responders needed to achieve a Go decision") +
theme(text = element_text(size = 20)) +
scale_color_manual(values = c("#D55E00", "#009E73"))
## Warning in scale_x_discrete(limits = c(seq(23, 40, by = 1))): Continuous limits supplied to discrete scale.
## ℹ Did you mean `limits = factor(...)` or `scale_*_continuous()`?
Predictive Posterior CDF for different Efficacy Rules
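To make the calculation behind the plot concrete, here is a minimal base-R sketch of a predictive probability for a single arm with one Beta prior, in the spirit of Lee and Liu (2008). The function `pred_prob_sketch()` and its column names are hypothetical and written only for illustration; this is not the `phase1b` implementation, which also supports Beta-mixture priors.

```r
# Hypothetical helper, not part of phase1b: predictive probability of a
# successful final analysis, assuming a single Beta(a, b) prior.
pred_prob_sketch <- function(x, n, Nmax, p, thetaT, a, b) {
  m <- Nmax - n  # patients still to be enrolled
  i <- 0:m       # possible numbers of future responders
  # Beta-binomial probability of i future responders given x responders in n
  log_density <- lchoose(m, i) +
    lbeta(a + x + i, b + n - x + m - i) -
    lbeta(a + x, b + n - x)
  density <- exp(log_density)
  # Posterior probability that the response rate exceeds p at the final analysis
  posterior <- pbeta(p, a + x + i, b + Nmax - x - i, lower.tail = FALSE)
  # The final analysis is a success when that probability exceeds thetaT
  success <- posterior > thetaT
  list(
    result = sum(density[success]),
    table = data.frame(future_responders = i, density, posterior, success)
  )
}

# Same inputs as the slide, with the efficacy rule set to 60% and to 90%
low <- pred_prob_sketch(x = 16, n = 23, Nmax = 40, p = 0.6, thetaT = 0.6, a = 0.6, b = 0.4)
high <- pred_prob_sketch(x = 16, n = 23, Nmax = 40, p = 0.6, thetaT = 0.9, a = 0.6, b = 0.4)
```

A stricter `thetaT` can only shrink the set of successful outcomes, so `high$result` can never exceed `low$result`.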
roxygen2 introduction
another resource
In summary, gaps were : Git merging, Pull Requests, Styling, Debugging, Writing Tests
Knowledge and skills for Statistical Software Engineering!
GitKraken, VS Code Git Graph Extension
pre-commit, GitHub checks, R CMD
roxygen2, building, reviewing documentation
testthat and checkmate
styler, prettier … building your own style
knitr::include_graphics("images/Peranakan_bowl.jpg", error = FALSE)
knitr::include_graphics("images/typed_asserts.png", error = FALSE)
Roxygen skeleton example for one user-facing function in phase1b
Types are defined with package roxygen2 and asserted with package checkmate
testthat and checkmate are used here
phase1b, a State-of-the-Art Software¹ : reproducible, robust, testable, intuitive and open to collaboration
library(roxygen2)
library(devtools)
roxygen2::roxygenise() # converts roxygen comments to .Rd files and updates NAMESPACE
devtools::document() # devtools wrapper around roxygenise(); R then renders the .Rd files as help pages
# or press Ctrl + Shift + D in RStudio, which runs devtools::document().
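As an illustration of what these commands process, the sketch below shows a roxygen skeleton on top of a function whose inputs are checked with `checkmate`. The function `post_prob_sketch()` is hypothetical and is not taken from `phase1b`; it assumes a single Beta prior and uses standard `@param` tags.

```r
library(checkmate)

#' Posterior probability that the response rate exceeds a threshold
#'
#' @param x (`count`) number of responders observed so far.
#' @param n (`count`) number of patients observed so far.
#' @param p (`number`) threshold for the response rate, between 0 and 1.
#' @param a (`number`) first parameter of the Beta prior.
#' @param b (`number`) second parameter of the Beta prior.
#' @return The posterior probability `P(RR > p | x, n)` as a single number.
#' @examples
#' post_prob_sketch(x = 16, n = 23, p = 0.6)
#' @export
post_prob_sketch <- function(x, n, p, a = 1, b = 1) {
  # Type assertions fail early with an informative message
  assert_count(n)
  assert_count(x)
  assert_true(x <= n)
  assert_number(p, lower = 0, upper = 1)
  assert_number(a, lower = .Machine$double.eps, finite = TRUE)
  assert_number(b, lower = .Machine$double.eps, finite = TRUE)
  # Beta prior plus binomial data gives a Beta(a + x, b + n - x) posterior
  pbeta(p, a + x, b + n - x, lower.tail = FALSE)
}
```

Calling `post_prob_sketch(16, 23, p = 1.5)` stops at the assertion on `p` instead of returning a misleading number.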
To find a needle in the haystack systematically
embrace small steps as smaller PRs can be sizeABLE.
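debugonce() # step through a function on its next call only
debug() # step through a function on every call until undebug()
undebug() # remove the debugging flag again
options(error = recover) # on error, choose a call frame to inspect
options(error = NULL) # restore the default error handling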
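A minimal sketch of how these base R tools fit together is shown here. The helper `h_response_rate()` is hypothetical, and the interactive parts are guarded so the script also runs in a non-interactive session.

```r
# Hypothetical helper used only to demonstrate the debugging flags
h_response_rate <- function(x, n) {
  x / n
}

debug(h_response_rate)                 # flag: every call opens the browser
flag_on <- isdebugged(h_response_rate)
undebug(h_response_rate)               # remove the flag again
flag_off <- isdebugged(h_response_rate)

if (interactive()) {
  debugonce(h_response_rate)  # flag is dropped after the next call
  h_response_rate(16, 23)     # opens the browser once
  options(error = recover)    # on error, pick a call frame to inspect
  # ... reproduce the failing call here ...
  options(error = NULL)       # back to default error handling
}
```

`debugonce()` is often the most convenient choice during review, because the flag cannot be forgotten on the function.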
1. One Design Document for entire function
2. One issue is the smallest task from (1)
3. One issue to one branch
4. One issue to one Pull Request (PR)
5. Create Test files for helper functions
6. Create Test files and example files for main function calls
knitr::include_graphics("images/Design_doc_ocPost.png")
1. User-facing functions start with a Design document
2. Helps achieve clarity on the form and purpose of the user-facing function
3. Test regular and edge cases
4. Makes the rest of the work “easier” when the goals are clear
5. Most of the “skeleton” and “flesh” of the work can already be done because of 1-3
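Testing regular and edge cases for a helper function can be sketched with `testthat` as follows. The helper `h_dbetabinom_sketch()` is a hypothetical stand-in written for this example, not the `phase1b` function, and the chosen cases are assumptions about what a design document might list.

```r
library(testthat)

# Hypothetical helper: beta-binomial density for x responders among m patients
h_dbetabinom_sketch <- function(x, m, a, b) {
  stopifnot(a > 0, b > 0, all(x >= 0), all(x <= m))
  exp(lchoose(m, x) + lbeta(x + a, m - x + b) - lbeta(a, b))
}

test_that("densities sum to one in a regular case", {
  expect_equal(sum(h_dbetabinom_sketch(0:17, m = 17, a = 0.6, b = 0.4)), 1)
})

test_that("edge case with no future patients gives probability one", {
  expect_equal(h_dbetabinom_sketch(0, m = 0, a = 1, b = 1), 1)
})

test_that("invalid prior parameters are rejected", {
  expect_error(h_dbetabinom_sketch(1, m = 5, a = -1, b = 1))
})
```

Writing such cases down in the design document first means the test file is mostly a transcription exercise.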
First Pull Request (PR) merged on Aug 29 2023 🥳
Submitted an abstract to PSI by November 2023
knitr::include_graphics("images/abstract.png")
phase1b journey in Pull Requests
data <- read.csv("phase1b_12mo_PR.csv", sep = ",")
data <- data[1:20,]
showcase_PR <- data.frame(`Pull request`= data$Pull.request,
Order= data$order)
names <- c("Pull Request", "Order of work")
# dimnames(showcase_PR) = list(1:32, names)
dimnames(showcase_PR) = list(1:20, names)
showcase_PR %>%
  kbl(align = "c") %>%
  kable_styling(font_size = 14) %>%
  kable_classic(lightable_options = "hover", html_font = "\"Source Sans Pro\", helvetica, sans-serif")
| Pull Request | Order of work |
|---|---|
| test for dbetabinom | 1 |
| pbetaMix and qbetaMix | 2 |
| postprob | 3 |
| ocPostprob | 4 |
| betadiff | 5 |
| postprobDist | 6 |
| getBetaMix and dbetaMix | 7 |
| predprob | 8 |
| Design doc for predprobDist | 9 |
| h_predprobdist_single_arm | 10 |
| h_predprobdist | 11 |
| predprobDist | 12 |
| h_get_decisionDist | 13 |
| h_getbetaMixpost | 14 |
| Design-doc_ocPredprob | 15 |
| h_get_decision_one_Predprob | 16 |
| h_get_decision_two_Predprob | 17 |
| h_get_oc | 18 |
| ocPredprob | 19 |
| Design-doc_ocPredprobDist | 20 |
phase1b journey in Commits
data <- read.csv("phase1b_12mo_PR.csv", sep = ",")
# treemap 2
treemap(data,
index ="order",
vSize="commits",
type="index",
mirror.y = TRUE,
title = "Commits decrease with later PRs",
algorithm = "pivotSize",
fontfamily.title = "helvetica",
fontfamily.labels = "helvetica",
border.lwds = 0.5
)
phase1b journey in Chats
data <- read.csv("phase1b_12mo_PR.csv", sep = ",")
# treemap 2
treemap(data,
index ="order",
vSize="chats",
type="index",
mirror.y = TRUE,
title = "Chats decrease with later PRs",
algorithm = "pivotSize",
fontfamily.title = "helvetica",
fontfamily.labels = "helvetica",
border.lwds = 0.5
)
Goal : introduce good practices, improve muscle memory and collaborate
debug(), undebug(),
options(error = recover),
options(error = NULL)
Goal : introduce good practices, improve muscle memory and collaborate
phase1b work
Positive bias & Trust are win-win situations
Learning through mistakes is key
Safe space to ask any questions, and iterate, even for a seasoned engineer
“Great motivation to learn new skills”
“… it was always fun and easy to work with Audrey”
Go through several more issues to complete and make it more delightful
submission to CRAN
Collaborate by contacting me
More software work for statisticians with values of : Inclusion, Diversity, Impact and Delightful user experience
As in Statistics, no matter how much you’ve improved it’s not permanent
knitr::include_graphics("images/hex6.png")
knitr::include_graphics("images/hex3.png")
knitr::include_graphics("images/hex9.png")
knitr::include_graphics("images/hex4.png")
knitr::include_graphics("images/hex5.png")
Daniel Sabanés Bové
…and other colleagues at Data Science Acceleration at Roche
Open Source community and R community that do great work and share their knowledge
I’d love to know how this presentation relates to you or does not !
Thall P F, Simon R (1994). Practical Bayesian Guidelines for Phase IIB Clinical Trials. Biometrics, 50, 337-349.
Lee J J, Liu D D (2008). A predictive probability design for phase II cancer clinical trials. Clinical Trials, 5(2), 93-106.
Yeo A T, Sabanés Bové D, Elze M, Pourmohamad T, Zhu J, Lymp J, Teterina A (2024). phase1b: Calculations for decisions on Phase 1b clinical trials. R package version 1.0.0, https://genentech.github.io/phase1b
Code for this presentation
Code with engineering. Microsoft
How to do a code review. Google
Pachecho, C, A Technical Journey into API Design-First: Best Practices and Lessons Learned
Why We need to Improve Software Engineering in Biostatistics (October 26 2023, R/Pharma)
Inclusive Speaker Course by Linux Foundation
¹ Subject to a change in definition with better tools and practices